{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "## Langchain & Llamaindex RAG comparison"
   ],
   "metadata": {
    "id": "wtb0XAiUCO9W"
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7wD8dJo-WZH7"
   },
   "source": [
    "This notebook compares Langchain & Llamaindex for understand which method is best extraction of table & text from PDF in the following\n",
    "\n",
    "\n",
    "Here we have covered\n",
    "\n",
    "1. Langchain RAG\n",
    "2. Llamaindex RAG\n",
    "3. Langchain wiht llamaparser\n",
    "4. Llamaindex with llamaparser\n",
    "\n",
    "\n",
    "from above this method will get idea about which is best method for table extraction for the following data used\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "HGcMcXLF7zoM",
    "outputId": "b4f6876b-0531-4bce-c1a6-1615c77322a2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting llama-index\n",
      "  Downloading llama_index-0.10.37-py3-none-any.whl (6.8 kB)\n",
      "Collecting llama-index-core\n",
      "  Downloading llama_index_core-0.10.37.post1-py3-none-any.whl (15.4 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m15.4/15.4 MB\u001b[0m \u001b[31m40.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting llama-index-embeddings-openai\n",
      "  Downloading llama_index_embeddings_openai-0.1.9-py3-none-any.whl (6.0 kB)\n",
      "Collecting llama-parse\n",
      "  Downloading llama_parse-0.4.3-py3-none-any.whl (7.7 kB)\n",
      "Collecting llama-index-agent-openai<0.3.0,>=0.1.4 (from llama-index)\n",
      "  Downloading llama_index_agent_openai-0.2.5-py3-none-any.whl (13 kB)\n",
      "Collecting llama-index-cli<0.2.0,>=0.1.2 (from llama-index)\n",
      "  Downloading llama_index_cli-0.1.12-py3-none-any.whl (26 kB)\n",
      "Collecting llama-index-indices-managed-llama-cloud<0.2.0,>=0.1.2 (from llama-index)\n",
      "  Downloading llama_index_indices_managed_llama_cloud-0.1.6-py3-none-any.whl (6.7 kB)\n",
      "Collecting llama-index-legacy<0.10.0,>=0.9.48 (from llama-index)\n",
      "  Downloading llama_index_legacy-0.9.48-py3-none-any.whl (2.0 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m52.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting llama-index-llms-openai<0.2.0,>=0.1.13 (from llama-index)\n",
      "  Downloading llama_index_llms_openai-0.1.19-py3-none-any.whl (11 kB)\n",
      "Collecting llama-index-multi-modal-llms-openai<0.2.0,>=0.1.3 (from llama-index)\n",
      "  Downloading llama_index_multi_modal_llms_openai-0.1.6-py3-none-any.whl (5.8 kB)\n",
      "Collecting llama-index-program-openai<0.2.0,>=0.1.3 (from llama-index)\n",
      "  Downloading llama_index_program_openai-0.1.6-py3-none-any.whl (5.2 kB)\n",
      "Collecting llama-index-question-gen-openai<0.2.0,>=0.1.2 (from llama-index)\n",
      "  Downloading llama_index_question_gen_openai-0.1.3-py3-none-any.whl (2.9 kB)\n",
      "Collecting llama-index-readers-file<0.2.0,>=0.1.4 (from llama-index)\n",
      "  Downloading llama_index_readers_file-0.1.22-py3-none-any.whl (36 kB)\n",
      "Collecting llama-index-readers-llama-parse<0.2.0,>=0.1.2 (from llama-index)\n",
      "  Downloading llama_index_readers_llama_parse-0.1.4-py3-none-any.whl (2.5 kB)\n",
      "Requirement already satisfied: PyYAML>=6.0.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (6.0.1)\n",
      "Requirement already satisfied: SQLAlchemy[asyncio]>=1.4.49 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (2.0.30)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.6 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (3.9.5)\n",
      "Collecting dataclasses-json (from llama-index-core)\n",
      "  Downloading dataclasses_json-0.6.6-py3-none-any.whl (28 kB)\n",
      "Collecting deprecated>=1.2.9.3 (from llama-index-core)\n",
      "  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)\n",
      "Collecting dirtyjson<2.0.0,>=1.0.8 (from llama-index-core)\n",
      "  Downloading dirtyjson-1.0.8-py3-none-any.whl (25 kB)\n",
      "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (2023.6.0)\n",
      "Collecting httpx (from llama-index-core)\n",
      "  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m9.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting jsonpath-ng<2.0.0,>=1.6.0 (from llama-index-core)\n",
      "  Downloading jsonpath_ng-1.6.1-py3-none-any.whl (29 kB)\n",
      "Collecting llamaindex-py-client<0.2.0,>=0.1.18 (from llama-index-core)\n",
      "  Downloading llamaindex_py_client-0.1.19-py3-none-any.whl (141 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m141.9/141.9 kB\u001b[0m \u001b[31m14.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (1.6.0)\n",
      "Requirement already satisfied: networkx>=3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (3.3)\n",
      "Requirement already satisfied: nltk<4.0.0,>=3.8.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (3.8.1)\n",
      "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (1.25.2)\n",
      "Collecting openai>=1.1.0 (from llama-index-core)\n",
      "  Downloading openai-1.30.1-py3-none-any.whl (320 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m320.6/320.6 kB\u001b[0m \u001b[31m26.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (2.0.3)\n",
      "Requirement already satisfied: pillow>=9.0.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (9.4.0)\n",
      "Requirement already satisfied: requests>=2.31.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (2.31.0)\n",
      "Requirement already satisfied: spacy<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (3.7.4)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.2.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (8.3.0)\n",
      "Collecting tiktoken>=0.3.3 (from llama-index-core)\n",
      "  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m54.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: tqdm<5.0.0,>=4.66.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (4.66.4)\n",
      "Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (4.11.0)\n",
      "Collecting typing-inspect>=0.8.0 (from llama-index-core)\n",
      "  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
      "Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from llama-index-core) (1.14.1)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core) (23.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core) (1.9.4)\n",
      "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core) (4.0.3)\n",
      "Collecting ply (from jsonpath-ng<2.0.0,>=1.6.0->llama-index-core)\n",
      "  Downloading ply-3.11-py2.py3-none-any.whl (49 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.6/49.6 kB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: beautifulsoup4<5.0.0,>=4.12.3 in /usr/local/lib/python3.10/dist-packages (from llama-index-readers-file<0.2.0,>=0.1.4->llama-index) (4.12.3)\n",
      "Collecting pypdf<5.0.0,>=4.0.1 (from llama-index-readers-file<0.2.0,>=0.1.4->llama-index)\n",
      "  Downloading pypdf-4.2.0-py3-none-any.whl (290 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m290.4/290.4 kB\u001b[0m \u001b[31m27.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting striprtf<0.0.27,>=0.0.26 (from llama-index-readers-file<0.2.0,>=0.1.4->llama-index)\n",
      "  Downloading striprtf-0.0.26-py3-none-any.whl (6.9 kB)\n",
      "Requirement already satisfied: pydantic>=1.10 in /usr/local/lib/python3.10/dist-packages (from llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core) (2.7.1)\n",
      "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core) (3.7.1)\n",
      "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core) (2024.2.2)\n",
      "Collecting httpcore==1.* (from httpx->llama-index-core)\n",
      "  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m10.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: idna in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core) (3.7)\n",
      "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core) (1.3.1)\n",
      "Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx->llama-index-core)\n",
      "  Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m6.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk<4.0.0,>=3.8.1->llama-index-core) (8.1.7)\n",
      "Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk<4.0.0,>=3.8.1->llama-index-core) (1.4.2)\n",
      "Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk<4.0.0,>=3.8.1->llama-index-core) (2023.12.25)\n",
      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai>=1.1.0->llama-index-core) (1.7.0)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->llama-index-core) (3.3.2)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->llama-index-core) (2.0.7)\n",
      "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (3.0.12)\n",
      "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (1.0.5)\n",
      "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (1.0.10)\n",
      "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (2.0.8)\n",
      "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (3.0.9)\n",
      "Requirement already satisfied: thinc<8.3.0,>=8.2.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (8.2.3)\n",
      "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (1.1.2)\n",
      "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (2.4.8)\n",
      "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (2.0.10)\n",
      "Requirement already satisfied: weasel<0.4.0,>=0.1.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (0.3.4)\n",
      "Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (0.9.4)\n",
      "Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (6.4.0)\n",
      "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (3.1.4)\n",
      "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (67.7.2)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (24.0)\n",
      "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core) (3.4.0)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index-core) (3.0.3)\n",
      "Collecting mypy-extensions>=0.3.0 (from typing-inspect>=0.8.0->llama-index-core)\n",
      "  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
      "Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json->llama-index-core)\n",
      "  Downloading marshmallow-3.21.2-py3-none-any.whl (49 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.3/49.3 kB\u001b[0m \u001b[31m4.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core) (2023.4)\n",
      "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core) (2024.1)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx->llama-index-core) (1.2.1)\n",
      "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4<5.0.0,>=4.12.3->llama-index-readers-file<0.2.0,>=0.1.4->llama-index) (2.5)\n",
      "Requirement already satisfied: language-data>=1.2 in /usr/local/lib/python3.10/dist-packages (from langcodes<4.0.0,>=3.2.0->spacy<4.0.0,>=3.7.1->llama-index-core) (1.2.0)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic>=1.10->llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core) (0.6.0)\n",
      "Requirement already satisfied: pydantic-core==2.18.2 in /usr/local/lib/python3.10/dist-packages (from pydantic>=1.10->llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core) (2.18.2)\n",
      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->llama-index-core) (1.16.0)\n",
      "Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<4.0.0,>=3.7.1->llama-index-core) (0.7.11)\n",
      "Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<4.0.0,>=3.7.1->llama-index-core) (0.1.4)\n",
      "Requirement already satisfied: cloudpathlib<0.17.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from weasel<0.4.0,>=0.1.0->spacy<4.0.0,>=3.7.1->llama-index-core) (0.16.0)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy<4.0.0,>=3.7.1->llama-index-core) (2.1.5)\n",
      "Requirement already satisfied: marisa-trie>=0.7.7 in /usr/local/lib/python3.10/dist-packages (from language-data>=1.2->langcodes<4.0.0,>=3.2.0->spacy<4.0.0,>=3.7.1->llama-index-core) (1.1.1)\n",
      "Installing collected packages: striprtf, ply, dirtyjson, pypdf, mypy-extensions, marshmallow, jsonpath-ng, h11, deprecated, typing-inspect, tiktoken, httpcore, httpx, dataclasses-json, openai, llamaindex-py-client, llama-index-legacy, llama-index-core, llama-parse, llama-index-readers-file, llama-index-llms-openai, llama-index-indices-managed-llama-cloud, llama-index-embeddings-openai, llama-index-readers-llama-parse, llama-index-multi-modal-llms-openai, llama-index-cli, llama-index-agent-openai, llama-index-program-openai, llama-index-question-gen-openai, llama-index\n",
      "Successfully installed dataclasses-json-0.6.6 deprecated-1.2.14 dirtyjson-1.0.8 h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 jsonpath-ng-1.6.1 llama-index-0.10.37 llama-index-agent-openai-0.2.5 llama-index-cli-0.1.12 llama-index-core-0.10.37.post1 llama-index-embeddings-openai-0.1.9 llama-index-indices-managed-llama-cloud-0.1.6 llama-index-legacy-0.9.48 llama-index-llms-openai-0.1.19 llama-index-multi-modal-llms-openai-0.1.6 llama-index-program-openai-0.1.6 llama-index-question-gen-openai-0.1.3 llama-index-readers-file-0.1.22 llama-index-readers-llama-parse-0.1.4 llama-parse-0.4.3 llamaindex-py-client-0.1.19 marshmallow-3.21.2 mypy-extensions-1.0.0 openai-1.30.1 ply-3.11 pypdf-4.2.0 striprtf-0.0.26 tiktoken-0.7.0 typing-inspect-0.9.0\n",
      "Collecting llama-index-postprocessor-flag-embedding-reranker\n",
      "  Downloading llama_index_postprocessor_flag_embedding_reranker-0.1.3-py3-none-any.whl (3.0 kB)\n",
      "Requirement already satisfied: llama-index-core<0.11.0,>=0.10.35 in /usr/local/lib/python3.10/dist-packages (from llama-index-postprocessor-flag-embedding-reranker) (0.10.37.post1)\n",
      "Requirement already satisfied: PyYAML>=6.0.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (6.0.1)\n",
      "Requirement already satisfied: SQLAlchemy[asyncio]>=1.4.49 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.0.30)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.6 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.9.5)\n",
      "Requirement already satisfied: dataclasses-json in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.6.6)\n",
      "Requirement already satisfied: deprecated>=1.2.9.3 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.2.14)\n",
      "Requirement already satisfied: dirtyjson<2.0.0,>=1.0.8 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.0.8)\n",
      "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2023.6.0)\n",
      "Requirement already satisfied: httpx in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.27.0)\n",
      "Requirement already satisfied: jsonpath-ng<2.0.0,>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.6.1)\n",
      "Requirement already satisfied: llamaindex-py-client<0.2.0,>=0.1.18 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.1.19)\n",
      "Requirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.6.0)\n",
      "Requirement already satisfied: networkx>=3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.3)\n",
      "Requirement already satisfied: nltk<4.0.0,>=3.8.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.8.1)\n",
      "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.25.2)\n",
      "Requirement already satisfied: openai>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.30.1)\n",
      "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.0.3)\n",
      "Requirement already satisfied: pillow>=9.0.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (9.4.0)\n",
      "Requirement already satisfied: requests>=2.31.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.31.0)\n",
      "Requirement already satisfied: spacy<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.7.4)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.2.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (8.3.0)\n",
      "Requirement already satisfied: tiktoken>=0.3.3 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.7.0)\n",
      "Requirement already satisfied: tqdm<5.0.0,>=4.66.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (4.66.4)\n",
      "Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (4.11.0)\n",
      "Requirement already satisfied: typing-inspect>=0.8.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.9.0)\n",
      "Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.14.1)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (23.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.9.4)\n",
      "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (4.0.3)\n",
      "Requirement already satisfied: ply in /usr/local/lib/python3.10/dist-packages (from jsonpath-ng<2.0.0,>=1.6.0->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.11)\n",
      "Requirement already satisfied: pydantic>=1.10 in /usr/local/lib/python3.10/dist-packages (from llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.7.1)\n",
      "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.7.1)\n",
      "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2024.2.2)\n",
      "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.0.5)\n",
      "Requirement already satisfied: idna in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.7)\n",
      "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.3.1)\n",
      "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.14.0)\n",
      "Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk<4.0.0,>=3.8.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (8.1.7)\n",
      "Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk<4.0.0,>=3.8.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.4.2)\n",
      "Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk<4.0.0,>=3.8.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2023.12.25)\n",
      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai>=1.1.0->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.7.0)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.3.2)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.0.7)\n",
      "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.0.12)\n",
      "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.0.5)\n",
      "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.0.10)\n",
      "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.0.8)\n",
      "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.0.9)\n",
      "Requirement already satisfied: thinc<8.3.0,>=8.2.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (8.2.3)\n",
      "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.1.2)\n",
      "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.4.8)\n",
      "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.0.10)\n",
      "Requirement already satisfied: weasel<0.4.0,>=0.1.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.3.4)\n",
      "Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.9.4)\n",
      "Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (6.4.0)\n",
      "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.1.4)\n",
      "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (67.7.2)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (24.0)\n",
      "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.4.0)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.0.3)\n",
      "Requirement already satisfied: mypy-extensions>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from typing-inspect>=0.8.0->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.0.0)\n",
      "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (3.21.2)\n",
      "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2023.4)\n",
      "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2024.1)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.2.1)\n",
      "Requirement already satisfied: language-data>=1.2 in /usr/local/lib/python3.10/dist-packages (from langcodes<4.0.0,>=3.2.0->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.2.0)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic>=1.10->llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.6.0)\n",
      "Requirement already satisfied: pydantic-core==2.18.2 in /usr/local/lib/python3.10/dist-packages (from pydantic>=1.10->llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.18.2)\n",
      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.16.0)\n",
      "Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.7.11)\n",
      "Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.1.4)\n",
      "Requirement already satisfied: cloudpathlib<0.17.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from weasel<0.4.0,>=0.1.0->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (0.16.0)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (2.1.5)\n",
      "Requirement already satisfied: marisa-trie>=0.7.7 in /usr/local/lib/python3.10/dist-packages (from language-data>=1.2->langcodes<4.0.0,>=3.2.0->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.35->llama-index-postprocessor-flag-embedding-reranker) (1.1.1)\n",
      "Installing collected packages: llama-index-postprocessor-flag-embedding-reranker\n",
      "Successfully installed llama-index-postprocessor-flag-embedding-reranker-0.1.3\n",
      "Collecting git+https://github.com/FlagOpen/FlagEmbedding.git\n",
      "  Cloning https://github.com/FlagOpen/FlagEmbedding.git to /tmp/pip-req-build-wmws0zv2\n",
      "  Running command git clone --filter=blob:none --quiet https://github.com/FlagOpen/FlagEmbedding.git /tmp/pip-req-build-wmws0zv2\n",
      "  Resolved https://github.com/FlagOpen/FlagEmbedding.git to commit 95b873d9ac923bca47436efeae39ca4559970210\n",
      "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "Requirement already satisfied: torch>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from FlagEmbedding==1.2.9) (2.2.1+cu121)\n",
      "Requirement already satisfied: transformers>=4.33.0 in /usr/local/lib/python3.10/dist-packages (from FlagEmbedding==1.2.9) (4.40.2)\n",
      "Collecting datasets (from FlagEmbedding==1.2.9)\n",
      "  Downloading datasets-2.19.1-py3-none-any.whl (542 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m542.0/542.0 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting accelerate>=0.20.1 (from FlagEmbedding==1.2.9)\n",
      "  Downloading accelerate-0.30.1-py3-none-any.whl (302 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m302.6/302.6 kB\u001b[0m \u001b[31m10.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting sentence_transformers (from FlagEmbedding==1.2.9)\n",
      "  Downloading sentence_transformers-2.7.0-py3-none-any.whl (171 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m171.5/171.5 kB\u001b[0m \u001b[31m9.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.20.1->FlagEmbedding==1.2.9) (1.25.2)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.20.1->FlagEmbedding==1.2.9) (24.0)\n",
      "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.20.1->FlagEmbedding==1.2.9) (5.9.5)\n",
      "Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.20.1->FlagEmbedding==1.2.9) (6.0.1)\n",
      "Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.20.1->FlagEmbedding==1.2.9) (0.20.3)\n",
      "Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.20.1->FlagEmbedding==1.2.9) (0.4.3)\n",
      "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->FlagEmbedding==1.2.9) (3.14.0)\n",
      "Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->FlagEmbedding==1.2.9) (4.11.0)\n",
      "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->FlagEmbedding==1.2.9) (1.12)\n",
      "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->FlagEmbedding==1.2.9) (3.3)\n",
      "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->FlagEmbedding==1.2.9) (3.1.4)\n",
      "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->FlagEmbedding==1.2.9) (2023.6.0)\n",
      "Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n",
      "Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n",
      "Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n",
      "Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n",
      "Collecting nvidia-cublas-cu12==12.1.3.1 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n",
      "Collecting nvidia-cufft-cu12==11.0.2.54 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n",
      "Collecting nvidia-curand-cu12==10.3.2.106 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n",
      "Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n",
      "Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n",
      "Collecting nvidia-nccl-cu12==2.19.3 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)\n",
      "Collecting nvidia-nvtx-cu12==12.1.105 (from torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n",
      "Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->FlagEmbedding==1.2.9) (2.2.0)\n",
      "Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch>=1.6.0->FlagEmbedding==1.2.9)\n",
      "  Using cached nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)\n",
      "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.33.0->FlagEmbedding==1.2.9) (2023.12.25)\n",
      "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers>=4.33.0->FlagEmbedding==1.2.9) (2.31.0)\n",
      "Requirement already satisfied: tokenizers<0.20,>=0.19 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.33.0->FlagEmbedding==1.2.9) (0.19.1)\n",
      "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.33.0->FlagEmbedding==1.2.9) (4.66.4)\n",
      "Requirement already satisfied: pyarrow>=12.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets->FlagEmbedding==1.2.9) (14.0.2)\n",
      "Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/dist-packages (from datasets->FlagEmbedding==1.2.9) (0.6)\n",
      "Collecting dill<0.3.9,>=0.3.0 (from datasets->FlagEmbedding==1.2.9)\n",
      "  Downloading dill-0.3.8-py3-none-any.whl (116 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets->FlagEmbedding==1.2.9) (2.0.3)\n",
      "Collecting xxhash (from datasets->FlagEmbedding==1.2.9)\n",
      "  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m11.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting multiprocess (from datasets->FlagEmbedding==1.2.9)\n",
      "  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m16.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets->FlagEmbedding==1.2.9) (3.9.5)\n",
      "Collecting huggingface-hub (from accelerate>=0.20.1->FlagEmbedding==1.2.9)\n",
      "  Downloading huggingface_hub-0.23.0-py3-none-any.whl (401 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m401.2/401.2 kB\u001b[0m \u001b[31m14.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from sentence_transformers->FlagEmbedding==1.2.9) (1.2.2)\n",
      "Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from sentence_transformers->FlagEmbedding==1.2.9) (1.11.4)\n",
      "Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from sentence_transformers->FlagEmbedding==1.2.9) (9.4.0)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->FlagEmbedding==1.2.9) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->FlagEmbedding==1.2.9) (23.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->FlagEmbedding==1.2.9) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->FlagEmbedding==1.2.9) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->FlagEmbedding==1.2.9) (1.9.4)\n",
      "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->FlagEmbedding==1.2.9) (4.0.3)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.33.0->FlagEmbedding==1.2.9) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.33.0->FlagEmbedding==1.2.9) (3.7)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.33.0->FlagEmbedding==1.2.9) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.33.0->FlagEmbedding==1.2.9) (2024.2.2)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.6.0->FlagEmbedding==1.2.9) (2.1.5)\n",
      "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets->FlagEmbedding==1.2.9) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets->FlagEmbedding==1.2.9) (2023.4)\n",
      "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets->FlagEmbedding==1.2.9) (2024.1)\n",
      "Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->sentence_transformers->FlagEmbedding==1.2.9) (1.4.2)\n",
      "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->sentence_transformers->FlagEmbedding==1.2.9) (3.5.0)\n",
      "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.6.0->FlagEmbedding==1.2.9) (1.3.0)\n",
      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->datasets->FlagEmbedding==1.2.9) (1.16.0)\n",
      "Building wheels for collected packages: FlagEmbedding\n",
      "  Building wheel for FlagEmbedding (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Created wheel for FlagEmbedding: filename=FlagEmbedding-1.2.9-py3-none-any.whl size=165917 sha256=24688c17b3bc6214be93c7bef77d4b9baacde749336b83f600188a704c6d8cad\n",
      "  Stored in directory: /tmp/pip-ephem-wheel-cache-45wml86h/wheels/41/cf/a5/5dee96ed64e5aaffe5aa3d583828258fdefed9a305db6e7f48\n",
      "Successfully built FlagEmbedding\n",
      "Installing collected packages: xxhash, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, dill, nvidia-cusparse-cu12, nvidia-cudnn-cu12, multiprocess, huggingface-hub, nvidia-cusolver-cu12, datasets, sentence_transformers, accelerate, FlagEmbedding\n",
      "  Attempting uninstall: huggingface-hub\n",
      "    Found existing installation: huggingface-hub 0.20.3\n",
      "    Uninstalling huggingface-hub-0.20.3:\n",
      "      Successfully uninstalled huggingface-hub-0.20.3\n",
      "Successfully installed FlagEmbedding-1.2.9 accelerate-0.30.1 datasets-2.19.1 dill-0.3.8 huggingface-hub-0.23.0 multiprocess-0.70.16 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.1.105 sentence_transformers-2.7.0 xxhash-3.4.1\n",
      "Collecting llama-index-vector-stores-lancedb\n",
      "  Downloading llama_index_vector_stores_lancedb-0.1.3-py3-none-any.whl (4.1 kB)\n",
      "Collecting lancedb<0.6.0,>=0.5.1 (from llama-index-vector-stores-lancedb)\n",
      "  Downloading lancedb-0.5.7-py3-none-any.whl (115 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.1/115.1 kB\u001b[0m \u001b[31m3.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: llama-index-core<0.11.0,>=0.10.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-vector-stores-lancedb) (0.10.37.post1)\n",
      "Collecting deprecation (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb)\n",
      "  Downloading deprecation-2.1.0-py2.py3-none-any.whl (11 kB)\n",
      "Collecting pylance==0.9.18 (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb)\n",
      "  Downloading pylance-0.9.18-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21.6 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.6/21.6 MB\u001b[0m \u001b[31m14.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting ratelimiter~=1.0 (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb)\n",
      "  Downloading ratelimiter-1.2.0.post0-py3-none-any.whl (6.6 kB)\n",
      "Collecting retry>=0.9.2 (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb)\n",
      "  Downloading retry-0.9.2-py2.py3-none-any.whl (8.0 kB)\n",
      "Requirement already satisfied: tqdm>=4.27.0 in /usr/local/lib/python3.10/dist-packages (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (4.66.4)\n",
      "Requirement already satisfied: pydantic>=1.10 in /usr/local/lib/python3.10/dist-packages (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (2.7.1)\n",
      "Requirement already satisfied: attrs>=21.3.0 in /usr/local/lib/python3.10/dist-packages (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (23.2.0)\n",
      "Collecting semver>=3.0 (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb)\n",
      "  Downloading semver-3.0.2-py3-none-any.whl (17 kB)\n",
      "Requirement already satisfied: cachetools in /usr/local/lib/python3.10/dist-packages (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (5.3.3)\n",
      "Requirement already satisfied: pyyaml>=6.0 in /usr/local/lib/python3.10/dist-packages (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (6.0.1)\n",
      "Requirement already satisfied: click>=8.1.7 in /usr/local/lib/python3.10/dist-packages (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (8.1.7)\n",
      "Requirement already satisfied: requests>=2.31.0 in /usr/local/lib/python3.10/dist-packages (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (2.31.0)\n",
      "Collecting overrides>=0.7 (from lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb)\n",
      "  Downloading overrides-7.7.0-py3-none-any.whl (17 kB)\n",
      "Requirement already satisfied: pyarrow>=12 in /usr/local/lib/python3.10/dist-packages (from pylance==0.9.18->lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (14.0.2)\n",
      "Requirement already satisfied: numpy>=1.22 in /usr/local/lib/python3.10/dist-packages (from pylance==0.9.18->lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (1.25.2)\n",
      "Requirement already satisfied: SQLAlchemy[asyncio]>=1.4.49 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2.0.30)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.6 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.9.5)\n",
      "Requirement already satisfied: dataclasses-json in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.6.6)\n",
      "Requirement already satisfied: deprecated>=1.2.9.3 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.2.14)\n",
      "Requirement already satisfied: dirtyjson<2.0.0,>=1.0.8 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.0.8)\n",
      "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2023.6.0)\n",
      "Requirement already satisfied: httpx in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.27.0)\n",
      "Requirement already satisfied: jsonpath-ng<2.0.0,>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.6.1)\n",
      "Requirement already satisfied: llamaindex-py-client<0.2.0,>=0.1.18 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.1.19)\n",
      "Requirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.6.0)\n",
      "Requirement already satisfied: networkx>=3.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.3)\n",
      "Requirement already satisfied: nltk<4.0.0,>=3.8.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.8.1)\n",
      "Requirement already satisfied: openai>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.30.1)\n",
      "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2.0.3)\n",
      "Requirement already satisfied: pillow>=9.0.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (9.4.0)\n",
      "Requirement already satisfied: spacy<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.7.4)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.2.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (8.3.0)\n",
      "Requirement already satisfied: tiktoken>=0.3.3 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.7.0)\n",
      "Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (4.11.0)\n",
      "Requirement already satisfied: typing-inspect>=0.8.0 in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.9.0)\n",
      "Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.14.1)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.3.1)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.9.4)\n",
      "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (4.0.3)\n",
      "Requirement already satisfied: ply in /usr/local/lib/python3.10/dist-packages (from jsonpath-ng<2.0.0,>=1.6.0->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.11)\n",
      "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.7.1)\n",
      "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2024.2.2)\n",
      "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.0.5)\n",
      "Requirement already satisfied: idna in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.7)\n",
      "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.3.1)\n",
      "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.14.0)\n",
      "Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk<4.0.0,>=3.8.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.4.2)\n",
      "Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk<4.0.0,>=3.8.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2023.12.25)\n",
      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai>=1.1.0->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.7.0)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic>=1.10->lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (0.6.0)\n",
      "Requirement already satisfied: pydantic-core==2.18.2 in /usr/local/lib/python3.10/dist-packages (from pydantic>=1.10->lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (2.18.2)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (3.3.2)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (2.0.7)\n",
      "Requirement already satisfied: decorator>=3.4.2 in /usr/local/lib/python3.10/dist-packages (from retry>=0.9.2->lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb) (4.4.2)\n",
      "Collecting py<2.0.0,>=1.4.26 (from retry>=0.9.2->lancedb<0.6.0,>=0.5.1->llama-index-vector-stores-lancedb)\n",
      "  Downloading py-1.11.0-py2.py3-none-any.whl (98 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m98.7/98.7 kB\u001b[0m \u001b[31m11.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.0.12)\n",
      "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.0.5)\n",
      "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.0.10)\n",
      "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2.0.8)\n",
      "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.0.9)\n",
      "Requirement already satisfied: thinc<8.3.0,>=8.2.2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (8.2.3)\n",
      "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.1.2)\n",
      "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2.4.8)\n",
      "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2.0.10)\n",
      "Requirement already satisfied: weasel<0.4.0,>=0.1.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.3.4)\n",
      "Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.9.4)\n",
      "Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (6.4.0)\n",
      "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.1.4)\n",
      "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (67.7.2)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (24.0)\n",
      "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.4.0)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.0.3)\n",
      "Requirement already satisfied: mypy-extensions>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from typing-inspect>=0.8.0->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.0.0)\n",
      "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (3.21.2)\n",
      "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2023.4)\n",
      "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2024.1)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.2.1)\n",
      "Requirement already satisfied: language-data>=1.2 in /usr/local/lib/python3.10/dist-packages (from langcodes<4.0.0,>=3.2.0->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.2.0)\n",
      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.16.0)\n",
      "Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.7.11)\n",
      "Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.1.4)\n",
      "Requirement already satisfied: cloudpathlib<0.17.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from weasel<0.4.0,>=0.1.0->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (0.16.0)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (2.1.5)\n",
      "Requirement already satisfied: marisa-trie>=0.7.7 in /usr/local/lib/python3.10/dist-packages (from language-data>=1.2->langcodes<4.0.0,>=3.2.0->spacy<4.0.0,>=3.7.1->llama-index-core<0.11.0,>=0.10.1->llama-index-vector-stores-lancedb) (1.1.1)\n",
      "Installing collected packages: ratelimiter, semver, py, overrides, deprecation, retry, pylance, lancedb, llama-index-vector-stores-lancedb\n",
      "Successfully installed deprecation-2.1.0 lancedb-0.5.7 llama-index-vector-stores-lancedb-0.1.3 overrides-7.7.0 py-1.11.0 pylance-0.9.18 ratelimiter-1.2.0.post0 retry-0.9.2 semver-3.0.2\n"
     ]
    }
   ],
   "source": [
    "# install dependencies\n",
    "%pip install llama-index llama-index-core llama-index-embeddings-openai llama-parse\n",
    "%pip install llama-index-postprocessor-flag-embedding-reranker\n",
    "%pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
    "%pip install llama-index-vector-stores-lancedb\n",
    "%pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-openai langchain-chroma bs4 lancedb\n",
    "%pip  install unstructured"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NqK1g8Bg7zlZ"
   },
   "outputs": [],
   "source": [
    "# llama-parse is async-first, running the async code in a notebook requires the use of nest_asyncio\n",
    "import os\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()\n",
    "\n",
    "# API access to llama-cloud\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\"\n",
    "# Using OpenAI API for embeddings/llms\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-...\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4OmWRDtAKONC"
   },
   "source": [
    "### Download the PDF (contains both tables & text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "smCjT2FIj9Fo"
   },
   "outputs": [],
   "source": [
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf' -O './uber_10q_march_2022.pdf'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1I--ouSiTGvj"
   },
   "source": [
    "# 1. Langchain with Q&A on PDF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "4ysMDhHiR2bG"
   },
   "outputs": [],
   "source": [
    "# import modules\n",
    "\n",
    "import bs4\n",
    "from langchain import hub\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_community.document_loaders import PyPDFLoader\n",
    "from langchain.vectorstores import LanceDB\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter"
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\")\n",
    "\n",
    "loader = PyPDFLoader(\"/content/uber_10q_march_2022.pdf\")\n",
    "docs = loader.load()\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
    "splits = text_splitter.split_documents(docs)\n",
    "\n",
    "# LanceDB as retriever\n",
    "vectorstore = LanceDB.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n",
    "\n",
    "# Retrieve and generate using the relevant snippets of the blog.\n",
    "retriever = vectorstore.as_retriever()\n",
    "prompt = hub.pull(\"rlm/rag-prompt\")\n",
    "\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)"
   ],
   "metadata": {
    "id": "2psX-a_6B-ZL"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "# retriever chain\n",
    "rag_chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ],
   "metadata": {
    "id": "8GiGkZ9BCCIs"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 54
    },
    "id": "4emkzsCqSMTe",
    "outputId": "d122f7f4-9072-4a4f-acf8-3d3b39755328"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'The net loss value attributable to Uber Technologies, Inc. for the period was $5.9 billion, compared to $108 million in the same period the previous year. This represents a significant increase in net loss year-over-year.'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# prompt\n",
    "qa_langchain_query1 = (\n",
    "    \" what is the net loss value attributable to Uber compared to last year?\"\n",
    ")\n",
    "rag_chain.invoke(qa_langchain_query1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 36
    },
    "id": "G4qR6GxTSMWO",
    "outputId": "38fff0fe-377f-4b10-aa4f-362712e539d9"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "\"I don't know.\""
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# prompt\n",
    "qa_langchain_query2 = \"how is the Cash paid for Income taxes, net of refunds from Supplemental disclosures of cash flow information?\"\n",
    "rag_chain.invoke(qa_langchain_query2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 36
    },
    "id": "b3DM-lCnSMZG",
    "outputId": "4c97e6f3-d610-447f-ba4f-62715ae274a1"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "\"I don't have detailed charts of intangible assets, net as of December 31, 2021 and March 31, 2022.\""
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# prompt\n",
    "qa_langchain_query3 = \"give me detailed charts of intangible assets, net as of December 31, 2021 and March 31, 2022\"\n",
    "rag_chain.invoke(qa_langchain_query3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pwZzEShGwwvT"
   },
   "source": [
    "FOR QUERY 2 & QUERY 3, We didn't got any output.\n",
    "\n",
    "**LETS TRY TO DO IT WITH LLAMAINDEX**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GVD2sPBEcRE3"
   },
   "source": [
    "# 2 . Llamaindex with Q&A on PDF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "H__qJIWYdmgY"
   },
   "outputs": [],
   "source": [
    "# import modules\n",
    "import textwrap\n",
    "from llama_index.vector_stores.lancedb import LanceDBVectorStore\n",
    "from llama_index.core import SimpleDirectoryReader, Document, StorageContext\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.core import SimpleDirectoryReader\n",
    "from llama_index.postprocessor.flag_embedding_reranker import (\n",
    "    FlagEmbeddingReranker,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "from llama_index.postprocessor.flag_embedding_reranker import (\n",
    "    FlagEmbeddingReranker,\n",
    ")\n",
    "\n",
    "# data loading in llamaindex\n",
    "reader = SimpleDirectoryReader(input_dir=\"/content/data_pdf/\")\n",
    "\n",
    "documents_pdf_loader = reader.load_data()"
   ],
   "metadata": {
    "id": "OWZq0upVC017"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "BCNNrAw9Dklk"
   },
   "outputs": [],
   "source": [
    "from llama_index.vector_stores.lancedb import LanceDBVectorStore\n",
    "\n",
    "# LanceDB as retriever\n",
    "vector_store_pdf = LanceDBVectorStore(uri=\"/tmp/lancedb_lamaindex\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1BLK8QPhcyMh"
   },
   "outputs": [],
   "source": [
    "storage_context_pdf = StorageContext.from_defaults(vector_store=vector_store_pdf)\n",
    "lance_index_pdf = VectorStoreIndex.from_documents(\n",
    "    documents_pdf_loader, storage_context=storage_context_pdf\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "# reranker\n",
    "reranker = FlagEmbeddingReranker(\n",
    "    top_n=5,\n",
    "    model=\"BAAI/bge-reranker-large\",\n",
    ")"
   ],
   "metadata": {
    "id": "xkUhLDPNC7dB"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "# index\n",
    "lance_index_query_pdf = lance_index_pdf.as_query_engine(\n",
    "    similarity_top_k=10, node_postprocessors=[reranker]\n",
    ")\n",
    "\n",
    "# query\n",
    "qa_lama_query1 = \"how is the Cash paid for Income taxes, net of refunds from Supplemental disclosures of cash flow information?\"\n",
    "output1 = lance_index_query_pdf.query(qa_lama_query1)\n",
    "print(output1.response)"
   ],
   "metadata": {
    "id": "Ko3bJM-HC-OO"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "oCt6P6WCfTVV",
    "outputId": "99bf8c85-56d8-4df5-d2b8-bf2b4439eeb9"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The net loss attributable to Uber Technologies, Inc. was $5.9 billion in the current period, compared to a net loss of $108 million in the same period last year.\n"
     ]
    }
   ],
   "source": [
    "# query\n",
    "qa_lama_query2 = (\n",
    "    \" what is the net loss value attributable to Uber compared to last year?\"\n",
    ")\n",
    "output2 = Lance_index_query_pdf.query(qa_lama_query2)\n",
    "print(output2.response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "M5RqjSoVcyVh",
    "outputId": "1b6bc4d9-1dd2-4f34-a147-33323a2906a5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The detailed charts of intangible assets, net as of December 31, 2021 and March 31, 2022 are as follows:\n",
      "\n",
      "**As of December 31, 2021:**\n",
      "- Consumer, Merchant and other relationships: $1,574 million\n",
      "- Developed technology: $653 million\n",
      "- Trade names and trademarks: $175 million\n",
      "- Patents: $8 million\n",
      "- Other: $2 million\n",
      "- Total Intangible assets: $2,412 million\n",
      "\n",
      "**As of March 31, 2022:**\n",
      "- Consumer, Merchant and other relationships: $1,494 million\n",
      "- Developed technology: $599 million\n",
      "- Trade names and trademarks: $167 million\n",
      "- Patents: $7 million\n",
      "- Other: $2 million\n",
      "- Total Intangible assets: $2,269 million\n"
     ]
    }
   ],
   "source": [
    "# query\n",
    "qa_lama_query3 = \"give me detailed charts of intangible assets, net as of December 31, 2021 and March 31, 2022\"\n",
    "output3 = Lance_index_query_pdf.query(qa_lama_query3)\n",
    "print(output3.response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-wbk2rFy0iOZ",
    "outputId": "f5b59d01-e78f-477c-bbb7-f4210c6b7232"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Adjusted EBITDA for 2021 was a loss of $359 million, while for 2022 it improved to $168 million. Interest expense for the period increased from $115 million in 2021 to $129 million in 2022.\n"
     ]
    }
   ],
   "source": [
    "# query\n",
    "qa_lama_query4 = \"what is Adjusted EBITDA 2021 vs 2022 ? what is intreset  expense\"\n",
    "output2 = Lance_index_query_pdf.query(qa_lama_query4)\n",
    "print(output2.response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zFgLYsCkcmUH"
   },
   "source": [
    "# 3 Llamaparser with Langchain on PDF\n",
    "\n",
    "we are simply saving all llamaparser output in .md file & based on that we are doing Q& A. there are better methods also to add llamaparser with Langchain **but lets do this experiment**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "QRne5LQsKugl",
    "outputId": "6ffe8b7e-41ac-4d9c-8993-b23db079b3d5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id 6aef2258-0e5c-4796-8ea1-f82e817cb542\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from llama_parse import LlamaParse\n",
    "\n",
    "# Ensure the data folder exists\n",
    "if not os.path.exists(\"data\"):\n",
    "    os.makedirs(\"data\")\n",
    "\n",
    "# Load data using LlamaParse\n",
    "documents_LlamaParse = LlamaParse(result_type=\"markdown\").load_data(\n",
    "    \"/content/uber_10q_march_2022.pdf\"\n",
    ")\n",
    "\n",
    "# Open the file in append mode ('a') and write the content\n",
    "with open(\"data/output.md\", \"a\") as f:  # Open the file in append mode ('a')\n",
    "    for doc in documents_LlamaParse:\n",
    "        f.write(doc.text + \"\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Jg1y4lO58FDB"
   },
   "outputs": [],
   "source": [
    "import bs4\n",
    "from langchain import hub\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "from langchain_chroma import Chroma\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "QENCaglMjher",
    "outputId": "bb1513e4-33c0-46cb-937b-c2700a796fd3"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "  0%|          | 0/1 [00:00<?, ?it/s]\u001b[A[nltk_data] Downloading package averaged_perceptron_tagger to\n",
      "[nltk_data]     /root/nltk_data...\n",
      "[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.\n",
      "\n",
      "100%|██████████| 1/1 [00:12<00:00, 12.93s/it]\n"
     ]
    }
   ],
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_community.document_loaders import DirectoryLoader\n",
    "\n",
    "llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\")\n",
    "\n",
    "loader = DirectoryLoader(\"/content/data\", glob=\"**/*.md\", show_progress=True)\n",
    "documents = loader.load()\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=300)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "\n",
    "# print(docs[])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-SoJIIOvoDzr",
    "outputId": "030722f4-b1ff-4cba-8827-4cb2d09ffb83"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content='Document\\n\\nUNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549\\n\\nFORM 10-Q\\n\\n(Mark One)\\n\\nQUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the quarterly period ended March 31, 2022\\n\\nCommission File Number: 001-38902\\n\\nUBER TECHNOLOGIES, INC.\\n\\n(Exact name of registrant as specified in its charter)\\n\\nDelaware 45-2647441\\n\\n(State or other jurisdiction of incorporation or organization) 1515 3rd Street (I.R.S. Employer Identification No.)\\n\\nSan Francisco, California 94158\\n\\n(Address of principal executive offices, including zip code) (415) 612-8582\\n\\n(Registrant’s telephone number, including area code)\\n\\nSecurities registered pursuant to Section 12(b) of the Act:\\n\\nTitle of each class Trading Symbol(s) Name of each exchange on which registered Common Stock, par value $0.00001 per share UBER New York Stock Exchange\\n\\nIndicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days. Yes ☒ No ☐\\n\\nIndicate by check mark whether the registrant has submitted electronically every Interactive Data File required to be submitted pursuant to Rule 405 of Regulation S-T (§232.405 of this chapter) during the preceding 12 months (or for such shorter period that the registrant was required to submit such files). Yes ☒ No ☐\\n\\nIndicate by check mark whether the registrant is a large accelerated filer, an accelerated filer, a non-accelerated filer, a smaller reporting company, or an emerging growth company. See the definitions of “large accelerated filer,” “accelerated filer,” “smaller reporting company,” and “emerging growth company” in Rule 12b-2 of the Exchange Act.', metadata={'source': '/content/data/output.md'})"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "gIOg35WpjEm2"
   },
   "outputs": [],
   "source": [
    "from langchain.vectorstores import LanceDB\n",
    "\n",
    "# LanceDB as retriever\n",
    "vectorstore = LanceDB.from_documents(documents=docs, embedding=OpenAIEmbeddings())\n",
    "\n",
    "# Retrieve and generate using the relevant snippets of the blog.\n",
    "retriever = vectorstore.as_retriever()\n",
    "prompt = hub.pull(\"rlm/rag-prompt\")\n",
    "\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "\n",
    "rag_chain_lama = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 54
    },
    "id": "_W6S8msJjEq5",
    "outputId": "1e9ce002-3278-4c5f-b0be-6fe592004b58"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'The net loss attributable to Uber Technologies, Inc. for the first quarter of 2022 was $5.9 billion, compared to a net loss of $108 million in the same period in 2021. This represents a significant increase in net loss compared to the previous year.'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# query\n",
    "query1 = \" what is the net loss value attributable to Uber compared to last year?\"\n",
    "rag_chain_lama.invoke(query1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 36
    },
    "id": "_2Q_n_-wjEtV",
    "outputId": "a0e8db08-a688-4c8d-9025-e8808cf80d7a"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'The Cash paid for Income taxes, net of refunds is not specifically mentioned in the provided context.'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# query\n",
    "query2 = \"how is the Cash paid for Income taxes, net of refunds from Supplemental disclosures of cash flow information?\"\n",
    "rag_chain_lama.invoke(query2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 90
    },
    "id": "0z8sapoziX7f",
    "outputId": "ae0e0d32-330d-4aa3-f274-a4ad455db430"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'Detailed charts of intangible assets, net as of December 31, 2021 and March 31, 2022, are as follows:\\n\\nAs of December 31, 2021:\\n- Consumer, Merchant and other relationships: $1,494 million\\n- Developed technology: $599 million\\n- Trade names and trademarks: $167 million\\n- Patents: $7 million\\n- Other: $2 million\\n\\nAs of March 31, 2022:\\n- Consumer, Merchant and other relationships: $1,574 million\\n- Developed technology: $653 million\\n- Trade names and trademarks: $175 million\\n- Patents: $8 million\\n- Other: $2 million'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# query\n",
    "query3 = \"give me detailed charts of intangible assets, net as of December 31, 2021 and March 31, 2022\"\n",
    "rag_chain_lama.invoke(query3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LTGUyJwxeF4Y"
   },
   "source": [
    "Till now, We are not getting all answers but now lets try the llamaparser with Llamaindex to see the results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MqzD4Rh4eCrE"
   },
   "source": [
    "# 4. Llamaparse with Llamaindex on PDF"
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "# using llamaparser with LlamaIndex\n",
    "\n",
    "from llama_index.postprocessor.flag_embedding_reranker import (\n",
    "    FlagEmbeddingReranker,\n",
    ")\n",
    "from llama_parse import LlamaParse\n",
    "\n",
    "pdf_table_LlamaParse = LlamaParse(result_type=\"markdown\").load_data(\n",
    "    \"/content/data_pdf/uber_10q_march_2022.pdf\"\n",
    ")"
   ],
   "metadata": {
    "id": "b3G5NejJDdTe"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "# LanceDB as retriever\n",
    "vector_store_lamaparser = LanceDBVectorStore(uri=\"/tmp/lancedb_parser\")\n",
    "storage_context_lamaparser = StorageContext.from_defaults(\n",
    "    vector_store=vector_store_lamaparser\n",
    ")"
   ],
   "metadata": {
    "id": "ee7p1axsDmqY"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "lance_index_lamaparser = VectorStoreIndex.from_documents(\n",
    "    pdf_table_LlamaParse, storage_context=storage_context_lamaparser\n",
    ")\n",
    "\n",
    "# reranker\n",
    "reranker = FlagEmbeddingReranker(\n",
    "    top_n=5,\n",
    "    model=\"BAAI/bge-reranker-large\",\n",
    ")\n",
    "\n",
    "# index\n",
    "Lance_index_query_lamaparser = lance_index_lamaparser.as_query_engine(\n",
    "    similarity_top_k=10, node_postprocessors=[reranker]\n",
    ")"
   ],
   "metadata": {
    "id": "18VzFSC8DpXc"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "h2lnfckunHLc",
    "outputId": "a248807f-aaa3-4d45-afd9-e9365b9eb6ac"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id 69ea99de-dd19-46db-86d5-e03d2027a7ce\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Adjusted EBITDA improved by $527 million from a loss of $359 million in 2021 to $168 million in 2022. Interest expense increased by an immaterial amount.\n"
     ]
    }
   ],
   "source": [
    "# query\n",
    "query_parser1 = \"what is Adjusted EBITDA 2021 vs 2022 ? what is intreset  expense\"\n",
    "\n",
    "response_1 = Lance_index_query_lamaparser.query(query_parser1)\n",
    "print(response_1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-ET0OUt1y_hj",
    "outputId": "b8347392-e54e-400f-af35-363a5eb87b6a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "| |Gross Carrying Value|Accumulated Amortization|Net Carrying Value|Useful Life - Years|\n",
      "|---|---|---|---|---|\n",
      "|Consumer, Merchant and other relationships|$1,868|$(294)|$1,574|9|\n",
      "|Developed technology|$922|$(269)|$653|5|\n",
      "|Trade names and trademarks|$222|$(47)|$175|6|\n",
      "|Patents|$15|$(7)|$8|7|\n",
      "|Other|$5|$(3)|$2|0|\n",
      "\n",
      "| |Gross Carrying Value|Accumulated Amortization|Net Carrying Value|Useful Life - Years|\n",
      "|---|---|---|---|---|\n",
      "|Consumer, Merchant and other relationships|$1,856|$(362)|$1,494|9|\n",
      "|Developed technology|$924|$(325)|$599|5|\n",
      "|Trade names and trademarks|$222|$(55)|$167|6|\n",
      "|Patents|$15|$(8)|$7|6|\n",
      "|Other|$5|$(3)|$2|0|\n"
     ]
    }
   ],
   "source": [
    "# query\n",
    "query_parser2 = \"give me detailed charts of intangible assets, net as of December 31, 2021 and March 31, 2022\"\n",
    "\n",
    "response_1 = Lance_index_query_lamaparser.query(query_parser2)\n",
    "print(response_1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "a1zhm2CxRykD",
    "outputId": "3243c11c-d789-41f8-ed3f-7c02ea5edae0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "***********New pdf+ lancedb ***********\n",
      "The cash paid for income taxes, net of refunds, was $22 million for the three months ended March 31, 2021, and $41 million for the three months ended March 31, 2022.\n"
     ]
    }
   ],
   "source": [
    "qa_lama_query4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rXTsFl7_efUJ"
   },
   "source": [
    "**WOW 🤩 the answers are more clear & better compaired to other methods .thats power of Llamaparser .this Llamaparser with LlamaIndex is doing quite well on table & text data from PDF.**\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}